Manages ontology parsing, curation, and synonym generation across multiple ontologies, and supports the integration of external knowledge sources.
Components
Ontology Parsing & Core Logic
This central component comprises the abstract base class for all ontology parsers and its concrete implementations. It defines how raw ontology data is converted into a structured format, how linking candidates are resolved, and how data is persisted. It also orchestrates the overall ontology preprocessing pipeline, interacting with other components for curation, synonym generation, and database population.
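The split between a shared base class and format-specific subclasses can be sketched as follows. This is only an illustration of the pattern, not Kazu's actual API: the class names, the `Record` fields, and the `populate` method are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row of structured ontology data (hypothetical shape)."""
    idx: str
    default_label: str
    synonym: str


class OntologyParserSketch(ABC):
    """Hypothetical base class: subclasses only define how to read the raw source."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def parse_to_records(self, raw: str) -> list[Record]:
        """Convert raw ontology data into structured records."""

    def populate(self, raw: str) -> dict[str, set[str]]:
        """Shared pipeline step: parse, then group IDs by lower-cased synonym."""
        synonym_to_ids: dict[str, set[str]] = {}
        for record in self.parse_to_records(raw):
            synonym_to_ids.setdefault(record.synonym.lower(), set()).add(record.idx)
        return synonym_to_ids


class TsvParserSketch(OntologyParserSketch):
    """Concrete parser for tab-separated lines of: id, default label, synonym."""

    def parse_to_records(self, raw: str) -> list[Record]:
        records = []
        for line in raw.strip().splitlines():
            idx, label, synonym = line.split("\t")
            records.append(Record(idx, label, synonym))
        return records
```

In this sketch a synonym that maps to more than one ID is simply kept as an ambiguous entry; deciding which IDs are genuinely equivalent is the kind of resolution step the real component is described as handling.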
Ontology Resource Management
This component loads, dumps, and processes ontology string resources and resolves conflicts between them. It analyzes several types of conflict (for example, case and normalization conflicts), applies autofixes, merges resource sets, and generates reports on the state and integrity of the resources.
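As a rough illustration of what a case conflict and an autofix might look like, the sketch below treats a resource set as a mapping from a curated string to a behaviour label. The function names, the behaviour labels ("add", "drop"), and the fix policy are assumptions made for this example, not the component's real interface.

```python
from collections import defaultdict


def find_case_conflicts(resources: dict[str, str]) -> dict[str, list[str]]:
    """Group strings that differ only by case but carry different behaviours.

    `resources` maps a curated string to a hypothetical behaviour label.
    """
    by_folded: dict[str, list[str]] = defaultdict(list)
    for text in resources:
        by_folded[text.casefold()].append(text)
    return {
        folded: sorted(variants)
        for folded, variants in by_folded.items()
        if len({resources[v] for v in variants}) > 1
    }


def autofix_case_conflicts(resources: dict[str, str], preferred: str) -> dict[str, str]:
    """Return a copy in which every conflicting variant takes the preferred behaviour."""
    fixed = dict(resources)
    for variants in find_case_conflicts(resources).values():
        for variant in variants:
            fixed[variant] = preferred
    return fixed
```

For example, `{"ALS": "add", "als": "drop"}` is reported as a conflict under the key `"als"`, while `{"Egfr": "add", "EGFR": "add"}` is not, because both variants agree.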
Ontology Data Acquisition
This component is responsible for downloading ontology data from various external sources. It includes different downloader implementations tailored for specific ontology formats and provides mechanisms for managing proxy settings and caching downloaded files.
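A minimal sketch of a cached download with optional proxy settings, using only the Python standard library. The function name, parameters, and cache layout are hypothetical; the real downloaders are format-specific and may behave quite differently.

```python
import urllib.request
from pathlib import Path
from typing import Optional


def download_with_cache(
    url: str,
    cache_dir: Path,
    filename: str,
    proxies: Optional[dict[str, str]] = None,
) -> Path:
    """Fetch `url` into `cache_dir/filename`, skipping the fetch if the file exists.

    `proxies` maps a scheme to a proxy URL, e.g. {"https": "http://proxy:8080"}.
    Passing None lets urllib pick up proxy settings from the environment;
    passing {} disables proxies entirely.
    """
    target = cache_dir / filename
    if target.exists():
        # Cache hit: no network access needed.
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(url) as response:
        target.write_bytes(response.read())
    return target
```

This sketch never invalidates the cache, so picking up a new ontology release would require deleting the cached file or using a versioned filename.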
Synonym Generation Strategies
This component provides a suite of strategies for generating additional synonyms from existing ontology terms. These strategies range from simple string replacements and combinatorial expansions to more complex token-based and verb phrase variant generation, often leveraging NLP models. It also includes automated curation actions that can modify or generate synonyms.
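The two simplest strategy families mentioned above, string replacement and combinatorial expansion, can be sketched as plain functions. The names and signatures are hypothetical and chosen for illustration; the token-based and verb phrase strategies that rely on NLP models are not covered here.

```python
from itertools import product


def replace_strategy(term: str, replacements: dict[str, list[str]]) -> set[str]:
    """Generate variants by swapping each matching substring for its alternatives."""
    variants = set()
    for old, news in replacements.items():
        if old in term:
            for new in news:
                variants.add(term.replace(old, new))
    variants.discard(term)  # never return the input itself
    return variants


def combinatorial_strategy(term: str, groups: list[set[str]]) -> set[str]:
    """Expand every combination of interchangeable whitespace-separated tokens.

    Each set in `groups` lists tokens treated as equivalent, e.g. {"II", "2"}.
    """
    options = []
    for token in term.split():
        group = next((g for g in groups if token in g), {token})
        options.append(sorted(group))
    variants = {" ".join(combo) for combo in product(*options)}
    variants.discard(term)
    return variants
```

For instance, `replace_strategy("HER-2 positive", {"-": [" ", ""]})` yields "HER 2 positive" and "HER2 positive". Combinatorial expansion grows multiplicatively with the number of substitutable tokens, so a real implementation would likely cap the output size.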
KRT Ontology Tools
This component covers the parts of the Kazu Resource Tool (KRT) dedicated to managing and updating ontologies. It includes user interface components for displaying update forms and managing downloader configurations, utilities for generating and analyzing ontology upgrade reports, and tools for managing resource conflicts within KRT.
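The contents of an ontology upgrade report are not specified here. As one plausible shape, assuming each ontology version can be reduced to a mapping from ID to default label, a report could list added, removed, and relabelled IDs. The function name and report keys below are hypothetical.

```python
def upgrade_report(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two ontology versions given as {id: default_label} mappings."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "relabelled": sorted(idx for idx in shared if old[idx] != new[idx]),
    }
```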
In-Memory Data Stores
This component provides efficient in-memory databases for storing and retrieving ontology metadata and synonyms. The MetadataDatabase holds descriptive information about ontology IDs, while the SynonymDatabase stores normalized synonyms and their associated linking candidates, enabling fast lookups during NLP processing.
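The two stores can be pictured as nested dictionaries keyed first by parser name. The sketch below is an assumption about the general shape only; the class names carry a `Sketch` suffix because the methods and storage layout are invented for illustration and are not the API of MetadataDatabase or SynonymDatabase.

```python
from collections import defaultdict
from typing import Any


class MetadataStoreSketch:
    """Maps parser name -> ontology ID -> descriptive metadata."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, dict[str, Any]]] = defaultdict(dict)

    def add(self, parser_name: str, idx: str, metadata: dict[str, Any]) -> None:
        self._data[parser_name][idx] = dict(metadata)

    def get(self, parser_name: str, idx: str) -> dict[str, Any]:
        return self._data[parser_name][idx]


class SynonymStoreSketch:
    """Maps parser name -> normalized synonym -> candidate ontology IDs."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, parser_name: str, normalized_synonym: str, candidate_ids: set[str]) -> None:
        self._data[parser_name][normalized_synonym] |= candidate_ids

    def get(self, parser_name: str, normalized_synonym: str) -> frozenset[str]:
        # .get() avoids creating empty entries for unknown parsers or synonyms.
        return frozenset(self._data.get(parser_name, {}).get(normalized_synonym, ()))
```

Lookups are dictionary accesses, which is what makes this layout fast enough to sit on the hot path of NLP processing.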
Core Ontology Data Models
This component defines the fundamental data structures that represent ontology information within the Kazu system. These models encapsulate concepts such as curated string resources, potential linking candidates, sets of equivalent IDs, global parser actions, and reports on ontology upgrades.
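One way to model two of these concepts, sets of equivalent IDs and linking candidates, is with frozen dataclasses, so that instances are immutable and hashable. The class and field names below are hypothetical and are not taken from Kazu's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EquivalentIdSetSketch:
    """IDs considered to refer to the same concept."""
    ids: frozenset[str]


@dataclass(frozen=True)
class LinkingCandidateSketch:
    """A normalized synonym together with the ID sets it may link to."""
    normalized_synonym: str
    parser_name: str
    equivalent_ids: frozenset[EquivalentIdSetSketch]
    raw_synonyms: frozenset[str] = field(default_factory=frozenset)
```

Because every field is itself immutable, candidates can be used as dictionary keys or set members, which makes deduplication straightforward.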
Text Processing Utilities
This component provides utility functions for text manipulation and processing within the Kazu system: string normalization, classification of symbolic strings, general path handling, conversion between data models, grouping, and string similarity scoring. Many of these utilities rely on external NLP libraries such as spaCy.
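To give a feel for three of these utilities (normalization, symbolic-string classification, and similarity scoring), here is a standard-library sketch. It deliberately avoids spaCy, and the function names, normalization rules, and the symbolic heuristic are assumptions rather than the component's actual behaviour.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Fold Unicode forms, turn punctuation into spaces, collapse whitespace, upper-case."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split()).upper()


def is_symbolic(text: str, max_len: int = 3) -> bool:
    """Crude heuristic: very short strings, or strings in which upper-case
    letters and digits are at least as numerous as lower-case letters."""
    if len(text) <= max_len:
        return True
    upper_or_digit = sum(ch.isupper() or ch.isdigit() for ch in text)
    lower = sum(ch.islower() for ch in text)
    return upper_or_digit >= lower


def similarity(a: str, b: str) -> float:
    """Score two strings in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

Under these rules "HER-2" and "her 2" normalize to the same string and therefore score 1.0, while "EGFR" is classed as symbolic and "asthma" is not.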